We propose using five data-driven community detection approaches from social networks to partition the label space for the task of multi-label classification, as an alternative to the random partitioning into equal subsets performed by RAkELd: modularity-maximizing fastgreedy and leading eigenvector, infomap, walktrap, and label propagation algorithms. We construct a label co-occurrence graph (in both weighted and unweighted versions) based on training data and perform community detection to partition the label set. We include Binary Relevance and Label Powerset classification methods for comparison. We use Gini-index-based Decision Trees as the base classifier. We compare educated approaches to label space division against random baselines on 12 benchmark data sets over five evaluation measures. We show that in almost all cases seven educated-guess approaches are more likely to outperform RAkELd than not in all measures except Hamming Loss. We show that fastgreedy and walktrap community detection methods on weighted label co-occurrence graphs are 85-92% more likely to yield better F1 scores than random partitioning. Infomap on the unweighted label co-occurrence graphs outperforms random partitioning on average 90% of the time in terms of Subset Accuracy and 89% of the time in terms of Jaccard similarity. Weighted fastgreedy is better on average than RAkELd in terms of Hamming Loss.
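
The label co-occurrence graph described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `label_cooccurrence_graph`, the toy label sets, and the choice of co-occurrence counts as edge weights are assumptions made for illustration only. Each label is a node, and an edge joins two labels that are assigned together to at least one training sample.

```python
from collections import Counter
from itertools import combinations


def label_cooccurrence_graph(label_sets, weighted=True):
    """Build an undirected label co-occurrence graph as {(label_a, label_b): weight}.

    Assumption: in the weighted version, the edge weight is the number of
    training samples in which both labels appear; in the unweighted version,
    every existing edge has weight 1.
    """
    counts = Counter()
    for labels in label_sets:
        # Sort so that each undirected edge has a single canonical key.
        for a, b in combinations(sorted(set(labels)), 2):
            counts[(a, b)] += 1
    if weighted:
        return dict(counts)
    return {edge: 1 for edge in counts}


# Hypothetical toy training data: one set of labels per sample.
train_labels = [
    {"beach", "sea"},
    {"beach", "sea", "sunset"},
    {"city", "night"},
    {"city", "night", "sunset"},
]

weighted_graph = label_cooccurrence_graph(train_labels, weighted=True)
unweighted_graph = label_cooccurrence_graph(train_labels, weighted=False)
```

A community detection algorithm run on such a graph would then group densely connected labels into the same subset of the label space, replacing the random equal-size partition.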